Municipality of Koper
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.06)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.04)
Neural Network-Guided Symbolic Regression for Interpretable Descriptor Discovery in Perovskite Catalysts
Xian, Yeming, Wang, Xiaoming, Yan, Yanfa
Understanding and predicting the activity of oxide perovskite catalysts for the oxygen evolution reaction (OER) requires descriptors that are both accurate and physically interpretable. While symbolic regression (SR) offers a path to discover such formulas, its performance degrades with high-dimensional inputs and small datasets. We present a two-phase framework that combines neural networks (NN), feature importance analysis, and SR to discover interpretable descriptors for OER activity in oxide perovskites. In Phase I, using a small dataset and seven structural features, we reproduce and improve the known μ/t descriptor by engineering composite features and applying SR, achieving training and validation MAEs of 22.8 and 20.8 meV, respectively. In Phase II, we expand to 164 features, reduce dimensionality, and identify LUMO energy as a key electronic descriptor. A final formula using μ/t, μ/RA, and LUMO energy achieves improved accuracy (training and validation MAEs of 22.1 and 20.6 meV) with strong physical interpretability. Our results demonstrate that NN-guided symbolic regression enables accurate, interpretable, and physically meaningful descriptor discovery in data-scarce regimes, indicating that interpretability need not come at the cost of accuracy in materials informatics.
- North America > United States (0.28)
- Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.04)
- Materials > Chemicals > Specialty Chemicals (0.62)
- Energy > Renewable (0.46)
The Representational Alignment between Humans and Language Models is implicitly driven by a Concreteness Effect
Iaia, Cosimo, Choksi, Bhavin, Wiebers, Emily, Roig, Gemma, Fiebach, Christian J.
The nouns of our language refer to either concrete entities (like a table) or abstract concepts (like justice or love), and cognitive psychology has established that concreteness influences how words are processed. Accordingly, understanding how concreteness is represented in our mind and brain is a central question in psychology, neuroscience, and computational linguistics. While the advent of powerful language models has allowed for quantitative inquiries into the nature of semantic representations, it remains largely underexplored how they represent concreteness. Here, we used behavioral judgments to estimate semantic distances implicitly used by humans, for a set of carefully selected abstract and concrete nouns. Using Representational Similarity Analysis, we find that the implicit representational space of participants and the semantic representations of language models are significantly aligned. We also find that both representational spaces are implicitly aligned to an explicit representation of concreteness, which was obtained from our participants using an additional concreteness rating task. Importantly, using ablation experiments, we demonstrate that the human-to-model alignment is substantially driven by concreteness, but not by other important word characteristics established in psycholinguistics. These results indicate that humans and language models converge on the concreteness dimension, but not on other dimensions.
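As a rough illustration of how Representational Similarity Analysis compares two representational spaces, the sketch below builds a dissimilarity list for each space and rank-correlates the two. It is a minimal sketch under stated assumptions, not the authors' pipeline: the toy vectors, the Euclidean distance, and the function names (`rdm_upper`, `average_ranks`, `rsa_alignment`) are all hypothetical, and in the study the human distances come from behavioral judgments rather than from vectors.

```python
from itertools import combinations
from math import dist, sqrt


def rdm_upper(vectors):
    """Pairwise Euclidean distances for every item pair (upper triangle of the RDM)."""
    return [dist(a, b) for a, b in combinations(vectors, 2)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rsa_alignment(space_a, space_b):
    """Spearman correlation between the two spaces' pairwise dissimilarities."""
    return pearson(average_ranks(rdm_upper(space_a)), average_ranks(rdm_upper(space_b)))


# Hypothetical toy data: three items represented in two different spaces.
human_space = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
model_space = [(0.0, 0.0), (2.0, 0.0), (0.0, 6.0)]
print(rsa_alignment(human_space, model_space))
```

Because only the rank order of the dissimilarities matters, two spaces that differ by a uniform rescaling are treated as perfectly aligned.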
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.05)
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.04)
- (6 more...)
Anti-aliasing of neural distortion effects via model fine tuning
Carson, Alistair, Wright, Alec, Bilbao, Stefan
Neural networks have become ubiquitous in guitar distortion effects modelling in recent years. Despite their ability to yield perceptually convincing models, they are susceptible to frequency aliasing when driven by high frequency and high gain inputs. Nonlinear activation functions create both the desired harmonic distortion and unwanted aliasing distortion as the bandwidth of the signal is expanded beyond the Nyquist frequency. Here, we present a method for reducing aliasing in neural models via a teacher-student fine tuning approach, where the teacher is a pre-trained model with its weights frozen, and the student is a copy of this with learnable parameters. The student is fine-tuned against an aliasing-free dataset generated by passing sinusoids through the original model and removing non-harmonic components from the output spectra. Our results show that this method significantly suppresses aliasing for both long short-term memory (LSTM) networks and temporal convolutional networks (TCN). In the majority of our case studies, the reduction in aliasing was greater than that achieved by two times oversampling. One side-effect of the proposed method is that harmonic distortion components are also affected. This adverse effect was found to be model-dependent, with the LSTM models giving the best balance between anti-aliasing and preserving the perceived similarity to an analog reference device.
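To make the aliasing mechanism concrete, the sketch below shows where the harmonics of a sinusoid land once a nonlinearity pushes them past the Nyquist frequency. It illustrates only the standard frequency-folding arithmetic, not the authors' fine-tuning method; the function names and the 5 kHz / 44.1 kHz example values are assumptions chosen for illustration.

```python
def folded_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a component appears after sampling (folded into 0..Nyquist)."""
    wrapped = freq_hz % sample_rate_hz
    if wrapped > sample_rate_hz / 2:
        return sample_rate_hz - wrapped
    return wrapped


def harmonic_report(f0_hz, sample_rate_hz, n_harmonics):
    """For each harmonic k*f0, return (k, observed frequency, was it folded?)."""
    report = []
    for k in range(1, n_harmonics + 1):
        target = k * f0_hz
        observed = folded_frequency(target, sample_rate_hz)
        report.append((k, observed, observed != target))
    return report


# Hypothetical example: a 5 kHz sinusoid at a 44.1 kHz sample rate.
for k, observed, folded in harmonic_report(5000, 44100, 6):
    label = "aliased" if folded else "harmonic"
    print(f"harmonic {k}: appears at {observed} Hz ({label})")
```

In this example the fifth and sixth harmonics (25 kHz and 30 kHz) exceed the 22.05 kHz Nyquist frequency and fold back to 19.1 kHz and 14.1 kHz, which are not integer multiples of the 5 kHz fundamental. Components of this kind are the non-harmonic content that the abstract describes removing from the output spectra.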
- Europe > Italy > Marche > Ancona Province > Ancona (0.05)
- Europe > United Kingdom > England > Surrey > Guildford (0.04)
- Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.04)
- (8 more...)
Connecting the Persian-speaking World through Transliteration
Merchant, Rayyan, Ramarao, Akhilesh Kakolu, Tang, Kevin
Despite speaking mutually intelligible varieties of the same language, speakers of Tajik Persian, written in a modified Cyrillic alphabet, cannot read Iranian and Afghan texts written in the Perso-Arabic script. As the vast majority of Persian text on the Internet is written in Perso-Arabic, monolingual Tajik speakers are unable to interface with the Internet in any meaningful way. This paper presents a transformer-based G2P approach to Tajik-Farsi transliteration, achieving chrF++ scores of 58.70 (Farsi to Tajik) and 74.20 (Tajik to Farsi) on novel digraphic datasets, setting a comparable baseline metric for future work. Our results also demonstrate the non-trivial difficulty of this task in both directions. We also provide an overview of the differences between the two scripts and the challenges they present, so as to aid future efforts in Tajik-Farsi transliteration.
Keywords: Persian, Tajik, Transliteration, Orthography, Computational Linguistics
1 Introduction
Tajik Persian (henceforth, Tajik) is the formal variety of Modern Persian spoken in Tajikistan. As such, it retains an extremely high level of mutual intelligibility with formal Persian as spoken in Iran and Afghanistan (henceforth referred to as Farsi). Unlike these two countries, which use the centuries-old Perso-Arabic script, Tajikistan uses the relatively new Tajik-Cyrillic script due to Tajikistan's Soviet heritage (Perry 2005). While proposals have been made to shift the script back to Perso-Arabic, any significant shift will likely not occur in the near future, with Tajikistan's former Minister of Culture stating in 2008 that "...some 90-95% of Tajikistan's population is not familiar with Arabic script..." (Ghufronov 2008).
- Asia > Tajikistan (1.00)
- Asia > Afghanistan (0.24)
- Asia > Middle East > Iran (0.24)
- (13 more...)
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance
Large Language Models (LLMs) have made significant strides in natural language generation but often face challenges in tasks requiring precise calculations and structural analysis. This paper investigates the performance of state-of-the-art LLMs on language complexity measurement tasks, through the computation of the LIX readability metric and Average Dependency Distance (ADD). Using Swedish high school and university-level essays, we evaluate the models' abilities to compute LIX scores and perform dependency parsing, comparing their results to established ground truths. Our findings reveal that while all models demonstrate some capacity for these tasks, ChatGPT-o1-mini performs most consistently, achieving the highest accuracy in both LIX computation and dependency parsing. Additionally, we observe a strong, significant correlation of -0.875 (p = 0.026, N = 6) between the models' accuracy in computing LIX and their overall performance on the Massive Multitask Language Understanding (MMLU) benchmark. These results suggest that language complexity measurement abilities can serve as noisy zero-shot proxies for assessing the general capabilities of LLMs, providing a practical method for model evaluation without the need for extensive benchmarking datasets.
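For readers unfamiliar with the two metrics, the sketch below computes them in their commonly cited form: LIX as the average sentence length plus the percentage of words longer than six letters, and ADD as the mean absolute distance between each non-root token and its head. It is a minimal sketch, not the paper's ground-truth procedure: the regex tokenizer, the sentence splitting on `.`, `!` and `?`, the function names, and the hand-written head indices are all assumptions.

```python
import re


def lix(text: str) -> float:
    """LIX = words per sentence + 100 * (words longer than 6 letters) / words."""
    words = re.findall(r"\w+", text)  # crude tokenizer (assumption)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


def average_dependency_distance(heads: list[int]) -> float:
    """Mean |token position - head position| over non-root tokens.

    heads[i] is the 1-based position of the head of token i+1; 0 marks the root.
    """
    distances = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(distances) / len(distances)


# Hypothetical sample text: 7 words, 2 sentences, 3 long words.
sample = "Detta är en mening. Universitetet publicerade resultaten."
print(round(lix(sample), 2))

# Hand-annotated heads for "Hon läser en bok" (an assumed parse, for illustration).
print(average_dependency_distance([2, 0, 4, 2]))
```

Both functions are plain counting arithmetic over tokens, which is the kind of precise calculation the abstract reports LLMs handling with varying accuracy.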
- Europe > Sweden > Stockholm > Stockholm (0.05)
- Africa > Nigeria (0.05)
- South America > Argentina (0.04)
- (3 more...)
Text-to-Image Generation for Vocabulary Learning Using the Keyword Method
Attygalle, Nuwan T., Kljun, Matjaž, Quigley, Aaron, Pucihar, Klen Čopič, Grubert, Jens, Biener, Verena, Leiva, Luis A., Yoneyama, Juri, Toniolo, Alice, Miguel, Angela, Kato, Hirokazu, Weerasinghe, Maheshya
The 'keyword method' is an effective technique for learning vocabulary of a foreign language. It involves creating a memorable visual link between what a word means and what its pronunciation in a foreign language sounds like in the learner's native language. However, these memorable visual links remain implicit in people's minds and are not easy to remember for a large set of words. To enhance the memorisation and recall of the vocabulary, we developed an application that combines the keyword method with text-to-image generators to externalise the memorable visual links into visuals. These visuals represent additional stimuli during the memorisation process. To explore the effectiveness of this approach, we first ran a pilot study to investigate how difficult it is to externalise the descriptions of mental visualisations of memorable links, by asking participants to write them down. We used these descriptions as prompts for a text-to-image generator (DALL-E2) to convert them into images and asked participants to select their favourites. Next, we compared different text-to-image generators (DALL-E2, Midjourney, Stable and Latent Diffusion) to evaluate the perceived quality of the images generated by each. Despite heterogeneous results, participants mostly preferred images generated by DALL-E2, which was also used for the final study. In this study, we investigated whether providing such images enhances the retention of the vocabulary being learned, compared to the keyword method only. Our results indicate that people did not encounter difficulties describing their visualisations of memorable links and that providing corresponding images significantly improves memory retention.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- (24 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Education (1.00)
- Leisure & Entertainment (0.93)
- Health & Medicine > Consumer Health (0.87)